Equivalence of Queries with Nested Aggregation

Author

  • David DeHaan
Abstract

Query equivalence is a fundamental problem within database theory. The correctness of all forms of logical query rewriting—join minimization, view flattening, rewriting over materialized views, various semantic optimizations that exploit schema dependencies, federated query processing and other forms of data integration—requires proving that the final executed query is equivalent to the original user query. Hence, advances in the theory of query equivalence enable advances in query processing and optimization. In this thesis we address the problem of deciding query equivalence between conjunctive SQL queries containing aggregation operators that may be nested. Our focus is on understanding the interaction between nested aggregation operators and the other parts of the query body, and so we model aggregation functions simply as abstract collection constructors. Hence, the precise language that we study is a conjunctive algebraic language that constructs complex objects from databases of flat relations. Using an encoding of complex objects as flat relations, we reduce the query equivalence problem for this algebraic language to deciding equivalence between relational encodings output by traditional conjunctive queries (not containing aggregation). This encoding-equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics. As part of our study of aggregation operators that can construct empty sub-collections—so-called “scalar” aggregation—we consider query equivalence for conjunctive queries extended with a left outer join operator, a very practical class of queries for which the general equivalence problem has never before been analyzed. Although we do not completely solve the equivalence problem for queries with outer joins or with scalar aggregation, we do propose useful sufficient conditions that generalize previously known results for restricted classes of queries. 
Overall, this thesis offers new insight into the fundamental principles governing the behaviour of nested aggregation.

Related resources

Equivalence, Containment and Rewriting of Aggregate Queries

The primary goal of this thesis is to lay the theoretical foundations for a formal study of aggregate query optimization. This requires gaining a coherent understanding of equivalences and containments between aggregate queries of varied forms. A secondary goal of this thesis is to solve the view usability problem for varied types of aggregate queries. The view usability problem is that of dete...

Nested Queries in Object Bases

[12] W. Kiessling. SQL-like and Quel-like correlation queries with aggregates revisited. 12 representing the result of f applied to the set of elements of e whose attribute a is equal to m. The above equivalence applied to the query yields: Note that the max function can be computed in a single scan (linear time) for max g;m;a;f if f is linear. Also note that an equivalent treatment for min can...

Equivalence and Normal Forms for the Restricted and Bounded Fixpoint in the Nested Algebra

The nested model is an extension of the traditional, "flat" relational model in which relations can also have relation-valued entries. Its "default" query language, the nested algebra, is rather weak, unfortunately, since it is only a conservative extension of the traditional, "flat" relational algebra, and thus can only express a small fraction of the polynomial-time queries. Therefore, it was p...

Verifying Equivalence of Spark Programs

Apache Spark is a popular framework for writing large scale data processing applications. Our long term goal is to develop automatic tools for reasoning about Spark programs. This is challenging because Spark programs combine database-like relational algebraic operations and aggregate operations, corresponding to (nested) loops, with User Defined Functions (UDFs). In this paper, we present a no...

Expressivity and Complexity of MongoDB Queries

In this paper, we consider MongoDB, a widely adopted but not formally understood database system managing JSON documents and equipped with a powerful query mechanism, called the aggregation framework. We provide a clean formal abstraction of this query language, which we call MQuery. We study the expressivity of MQuery, showing the equivalence of its well-typed fragment with nested relational a...

Publication date: 2009